A Study of Association Measures and their Combination for Arabic MWT Extraction

Authors

  • Abdelkader El Mahdaouy
  • Saïd El Alaoui Ouatik
  • Éric Gaussier
Abstract

Automatic Multi-Word Term (MWT) extraction is important for many applications, such as information retrieval, question answering, and text categorization. Although many methods have been used for MWT extraction in English and other European languages, few studies have addressed Arabic. In this paper, we propose a novel hybrid method that combines linguistic and statistical approaches for Arabic Multi-Word Term extraction. The main contribution of our method is to consider contextual information together with both termhood and unithood in the association measures used at the statistical filtering step. In addition, our technique takes into account the problem of MWT variation in the linguistic filtering step. The performance of the proposed statistical measure (NLC-value) is evaluated on an Arabic environment corpus by comparing it with several existing measures. Experimental results show that our NLC-value measure outperforms the others in terms of precision for both bi-grams and tri-grams.
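
The abstract does not define the NLC-value, so the following is only a minimal sketch of what a statistical filtering step over candidate bi-grams can look like. It uses pointwise mutual information (PMI), a generic unithood measure, as a stand-in. The function name `rank_bigrams_by_pmi`, the `min_count` threshold, and the English toy corpus are illustrative assumptions and do not come from the paper; a real pipeline would presumably score Arabic candidates that have already passed the linguistic filter.

```python
import math
from collections import Counter


def rank_bigrams_by_pmi(sentences, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    PMI is a generic unithood measure used here as a stand-in;
    it is NOT the NLC-value measure proposed in the paper.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent pairs only

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:  # hypothetical frequency cut-off
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    # Highest-scoring candidates first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy, pre-tokenized corpus (illustrative only)
corpus = [
    ["climate", "change", "affects", "water", "resources"],
    ["water", "resources", "are", "limited"],
    ["climate", "change", "is", "global"],
    ["the", "change", "is", "slow"],
]
ranked = rank_bigrams_by_pmi(corpus)
for pair, score in ranked:
    print(pair, round(score, 2))
```

A measure of this kind captures only how strongly two words co-occur; the abstract indicates that the proposed measure additionally accounts for termhood and contextual information, which this sketch does not attempt.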

Journal:
  • CoRR

Volume abs/1409.3005

Publication date 2014